Failover

In computing, failover is automatic switching to a redundant or standby computer server, system, or network upon the failure or abnormal termination of the previously active application,[1] server, system, or network. Failover and switchover are essentially the same operation, except that failover is automatic and usually operates without warning, while switchover requires human intervention.

Systems designers usually provide failover capability in servers, systems or networks requiring continuous availability and a high degree of reliability.

At server level, failover automation usually uses a "heartbeat" cable that connects two servers. As long as a regular "pulse" or "heartbeat" continues between the main server and the second server, the second server will not bring its systems online. The second server takes over the work of the first as soon as it detects an interruption in the "heartbeat" of the first machine. There may also be a third "spare parts" server that keeps spare components running for "hot" switching to prevent downtime. Some systems can send a notification when a failover occurs.
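The heartbeat logic described above can be sketched from the standby server's point of view. This is a minimal illustration under stated assumptions, not any particular product's implementation: the class name `StandbyMonitor`, the 3-second timeout, and the `on_failover` notification hook are all hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow.

```python
class StandbyMonitor:
    """Standby-side view of the primary's heartbeat (hypothetical sketch)."""

    def __init__(self, timeout_s=3.0, on_failover=None):
        self.timeout_s = timeout_s      # silence tolerated before takeover (assumed value)
        self.on_failover = on_failover  # optional notification hook
        self.last_beat = None           # time of the most recent heartbeat, in seconds
        self.active = False             # True once the standby has taken over

    def heartbeat(self, now):
        """Record a pulse received from the primary at time `now`."""
        self.last_beat = now

    def check(self, now):
        """Take over if the primary has been silent for longer than timeout_s."""
        if self.active or self.last_beat is None:
            return self.active
        if now - self.last_beat > self.timeout_s:
            self.active = True
            if self.on_failover is not None:
                self.on_failover(now)   # notify that a failover happened
        return self.active
```

In this sketch the standby stays passive until it has seen at least one heartbeat, so a standby that starts before the primary does not take over immediately. A real deployment would also need to guard against both servers becoming active at once when only the heartbeat link fails.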

Some systems are intentionally designed not to fail over entirely automatically, and instead require human intervention. In this "automated with manual approval" configuration, the failover runs automatically once a human has approved it.
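The "automated with manual approval" arrangement can be sketched as a small gate between failure detection and the actual switch. The names `ApprovedFailover`, `failure_detected`, `approve`, and the `promote` callback are hypothetical, chosen only for illustration.

```python
class ApprovedFailover:
    """Failure detection is automatic; the switch waits for a human (sketch)."""

    def __init__(self, promote):
        self.promote = promote  # callable that performs the actual switch
        self.pending = False    # a failure was detected and awaits approval
        self.done = False       # the failover has already been carried out

    def failure_detected(self):
        """Called by monitoring; only raises a request, never switches."""
        if not self.done:
            self.pending = True

    def approve(self, operator):
        """A human approves; the rest runs without further input."""
        if not self.pending:
            return False        # nothing to approve
        self.pending = False
        self.promote(operator)  # automated part of the failover
        self.done = True
        return True
```

Approving without a pending request does nothing, so an operator cannot trigger a switch unless a failure has actually been detected.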

Failback is the process of restoring a system, component, or service in a state of failover back to its original state (before failure).

The use of virtualization software has allowed failover practices to become less reliant on physical hardware.

Failover types

Failover in disaster recovery

There are two types of failover:

  1. Automatic failover: Two servers are located in two different geographic locations. If a disaster happens at the host site, the secondary server takes over automatically, without user or support intervention. This usually relies on online data replication from the host to the surviving recovery site, or on clustering technology to fail over to the secondary server. Other high-availability technologies, such as Hyper-V or VMware, keep the interruption to a minimum so that business can resume as normal. This solution is primarily used for high-reliability or critical applications and systems.
  2. Manual failover: In this case, user or support team intervention is necessary. For example, if an abnormality occurs at the host site, the support team has to restore the database manually at the surviving site, then switch users to the recovery site to resume business as usual. This is also known as a backup and restore solution, and is usually used for non-critical applications or systems.

Fallback in disaster recovery

There are also two types of fallback:

  1. After a disaster recovery test that has failed, a fallback (revert) takes place.
  2. After recovery is completed, a fallback (return to normal operation) takes place. The term failback is often used in place of fallback, but failback here implies that there are two recovery sites. In other words, it is a move to a second disaster recovery site.

In short,

  1. Failover (automatic or manual): from host to recovery site
  2. Fallback (automatic or manual): from recovery site to host
  3. Failback (automatic or manual): from recovery site 1 to recovery site 2
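The three terms in this summary can be written down as a lookup on the source and destination site. This is only an illustrative sketch of the terminology used in this section; the `Site` enum and the `classify` function are hypothetical names.

```python
from enum import Enum


class Site(Enum):
    HOST = "host"
    RECOVERY_1 = "recovery site 1"
    RECOVERY_2 = "recovery site 2"


RECOVERY_SITES = (Site.RECOVERY_1, Site.RECOVERY_2)


def classify(src, dst):
    """Name the move from src to dst using this section's terminology."""
    if src is Site.HOST and dst in RECOVERY_SITES:
        return "failover"   # host -> recovery site
    if src in RECOVERY_SITES and dst is Site.HOST:
        return "fallback"   # recovery site -> host
    if src is Site.RECOVERY_1 and dst is Site.RECOVERY_2:
        return "failback"   # recovery site 1 -> recovery site 2
    return None             # not covered by the summary above
```

For example, `classify(Site.HOST, Site.RECOVERY_1)` yields `"failover"`, while a move that the summary does not name, such as recovery site 2 to recovery site 1, yields `None`.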

References

  1. For application-level failover, see for example Jayaswal, Kailash (2005). "27". Administering Data Centers: Servers, Storage, And Voice Over IP. Wiley-India. p. 364. ISBN 9788126506880. http://books.google.com/books?id=W48oOMKU0RIC&pg=PA364#v=onepage&q=&f=false. Retrieved 2009-08-07. "Although it is impossible to prevent some data loss during an application failover, certain steps can [...] minimize it."